feat: Support planning subqueries with OuterReferenceColumn belongs to non-adjacent outer relations #19930
    Projection: l.a, l.b, l.c, l.d, l.e
    SubqueryAlias: l
    Projection: lineitem.l_item_id AS a, lineitem.l_description AS b, lineitem.price AS c
    Projection: lineitem.l_orderkey AS a, lineitem.l_item_id AS b, lineitem.l_description AS c, lineitem.l_extendedprice AS d, lineitem.price AS e
This is because the lineitem test table now has five columns instead of three.
    "o1.o_custkey".to_string(),
    "o1.o_orderstatus".to_string(),
    "o1.customer_id".to_string(),
    "o1.o_totalprice".to_string(),
This is because the orders table is extended now.
    assert_snapshot!(
    err.strip_backtrace(),
    @"Error during planning: Source table contains 3 columns but only 2 names given as column alias"
    @"Error during planning: Source table contains 5 columns but only 2 names given as column alias"
This is because the lineitem test table now has five columns instead of three.
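For readers unfamiliar with the error in this snapshot: it is raised when a table alias lists fewer column names than the source table has columns. The following is a minimal sketch of such a check, not DataFusion's actual code. The helper name `check_column_aliases` and the rule that an empty alias list is accepted are assumptions made for illustration; only the message text is taken from the snapshot above (without the "Error during planning: " prefix).

```rust
/// Hypothetical helper (not part of DataFusion): verify that a table alias
/// such as `lineitem l (a, b)` names every column of the source table.
pub fn check_column_aliases(num_columns: usize, aliases: &[&str]) -> Result<(), String> {
    // Assumption: an alias without a column list is always accepted.
    if aliases.is_empty() || aliases.len() == num_columns {
        return Ok(());
    }
    // Message text mirrors the snapshot in the diff above.
    Err(format!(
        "Source table contains {} columns but only {} names given as column alias",
        num_columns,
        aliases.len()
    ))
}
```

Under these assumptions, calling the helper with 5 columns and the aliases `a, b` yields the "5 columns but only 2 names" message, which is why the expected text changed once the test table grew.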
    ])),
    "orders" => Ok(Schema::new(vec![
    Field::new("order_id", DataType::UInt32, false),
    Field::new("o_orderkey", DataType::UInt32, false),
These extensions were necessary to get the tests to work.
## Which issue does this PR close?
While working on #19930, I noticed a trailing whitespace in the CROSS JOIN logical plan output. This whitespace is inconsistent with the rest of the logical plan formatting.
## Rationale for this change
This change removes the unnecessary trailing whitespace in the logical plan output of a CROSS JOIN.
## What changes are included in this PR?
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No.
Force-pushed from 99902bc to 3b1fa64
@alamb @duongcongtoai Could you review this when you have a moment so we can get it merged?
Force-pushed from af87772 to cd81a28
nice, i'll take a look once i have time
    fn table_with_column_alias() {
    let sql = "SELECT a, b, c
    FROM lineitem l (a, b, c)";
    let sql = "SELECT a, b, c, d, e
would be nice to have a sqllogictest for this PR
ah i see you replaced them with integration tests instead
I agree we should have some sqllogictest coverage to make sure everything is hooked up right
I added back two sqllogictests that pass without the optimizer changes; the other two remain in sql_integration.rs because they depend on the optimizer change.
    // TODO revisit this validation logic
    plan_err!(
    "Correlated scalar subquery in the GROUP BY clause must also be in the aggregate expressions"
    "Correlated scalar subquery in the GROUP BY clause must \
nit: this change is unrelated (it's in my original PR)
This is from fmt.
@duongcongtoai Thank you for your feedback!
alamb left a comment
Thank you @mkleen and @duongcongtoai
I think this looks pretty good to me -- I have some small suggestions for API improvements, but the basic idea is 🏆
It would also be nice to restore one or two slt tests to make sure this code is connected end to end
    /// The query schema of the outer query plan, used to resolve the columns in subquery
    outer_query_schema: Option<DFSchemaRef>,

    /// The queries schemas of outer query relations, used to resolve the outer referenced
perfect
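To illustrate why a list of outer schemas is needed instead of a single optional one, here is a rough sketch of a planner context that keeps one schema per enclosing query. The type `ScopeStack` and its methods are hypothetical and do not reflect DataFusion's `PlannerContext` API, and a schema is reduced to a list of column names for brevity.

```rust
/// Hypothetical planner context (not DataFusion's `PlannerContext`) that
/// tracks the schemas of all enclosing queries, not just the adjacent one.
#[derive(Debug, Default)]
pub struct ScopeStack {
    /// Outermost query first, directly enclosing query last.
    outer_schemas: Vec<Vec<String>>,
}

impl ScopeStack {
    /// Run `plan` with `schema` pushed as the newest enclosing scope, then
    /// restore the previous stack so sibling subqueries are unaffected.
    pub fn with_outer_schema<T>(
        &mut self,
        schema: Vec<String>,
        plan: impl FnOnce(&mut Self) -> T,
    ) -> T {
        self.outer_schemas.push(schema);
        let result = plan(self);
        self.outer_schemas.pop();
        result
    }

    /// Number of enclosing queries currently in scope.
    pub fn depth(&self) -> usize {
        self.outer_schemas.len()
    }

    /// True if `name` is visible in any enclosing scope, adjacent or not.
    pub fn is_outer_column(&self, name: &str) -> bool {
        self.outer_schemas
            .iter()
            .any(|schema| schema.iter().any(|c| c.as_str() == name))
    }
}
```

With a single `Option` field, a subquery nested two levels deep would only see its direct parent. With a stack, a column from the grandparent query remains resolvable while the innermost subquery is planned.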
@alamb Thank you for the review. I will add your feedback soon!
Thanks @mkleen -- sorry for the delay in reviewing this
Force-pushed from 8899f47 to 5328de1
Force-pushed from 9ba50c7 to 42170bf
@alamb @duongcongtoai Could you please do one more review? The interesting change is 818926d
alamb left a comment
I think this looks good and we should merge this in to keep things moving. I have some ideas on how to improve it but will do that in a follow-on PR
Thank you @mkleen and @duongcongtoai
    // Check the outer query schema
    if let Some(outer) = planner_context.outer_query_schema()
    && let Ok((qualifier, field)) =
    for outer in planner_context.outer_queries_schemas() {
Upon a second look at this code, I think this is resolving the names in the wrong order: it starts at the outermost query but really should start at the innermost. Something like: for outer in planner_context.outer_queries_schemas().iter().rev()
I'll try and come up with a reproducer for this one
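The ordering concern can be illustrated with a small standalone sketch. It is not the PR's code: `OuterSchema` and `resolve_outer_column` are hypothetical stand-ins, and the sketch assumes the schema list is ordered outermost-first, as the comment above implies. Iterating in reverse lets the nearest enclosing query shadow a same-named column from a query further out.

```rust
/// Hypothetical stand-in for a relation schema: a qualifier plus column names.
#[derive(Debug, Clone, PartialEq)]
pub struct OuterSchema {
    pub qualifier: String,
    pub columns: Vec<String>,
}

/// Resolve an unqualified column name against a stack of outer schemas.
/// Assumption: `outer_schemas[0]` is the outermost query and the last
/// element is the query directly enclosing the subquery.
pub fn resolve_outer_column<'a>(
    outer_schemas: &'a [OuterSchema],
    name: &str,
) -> Option<(&'a str, &'a str)> {
    // Walk innermost-first so the nearest enclosing scope wins on name clashes.
    outer_schemas.iter().rev().find_map(|schema| {
        schema
            .columns
            .iter()
            .find(|c| c.as_str() == name)
            .map(|c| (schema.qualifier.as_str(), c.as_str()))
    })
}
```

Without the `.rev()`, a column that exists in both the outermost and the directly enclosing query would bind to the outermost one, which is the behavior the comment flags as wrong.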
datafusion/sql/tests/common/mod.rs
Outdated
    Field::new("c_name", DataType::Utf8, false),
    Field::new("c_address", DataType::Utf8, false),
    Field::new("c_nationkey", DataType::UInt32, false),
    Field::new("c_phone", DataType::Decimal32(15, 2), false),
c_phone is Utf8; see datafusion/benchmarks/src/tpch/mod.rs, line 86 in badaa84:
    Field::new("c_phone", DataType::Utf8, false),
datafusion/sql/tests/common/mod.rs
Outdated
    Field::new("o_custkey", DataType::UInt32, false),
    Field::new("o_orderstatus", DataType::Utf8, false),
    Field::new("customer_id", DataType::UInt32, false),
    Field::new("o_totalprice", DataType::Decimal32(15, 2), false),
Decimal32 is inconsistent with the other schemas, which use Decimal128: https://github.com/apache/datafusion/blob/main/benchmarks/src/tpch/mod.rs#L81-L110
I'll fix
I took the liberty of pushing a few more tests, and some small API cleanups
Thanks again @duongcongtoai and @mkleen for all your help 🙏
Which issue does this PR close?
Cleaned-up version of #18806 with:
- Removed outer_queries_schema from PlannerContext
- Planning logic only (optimizer modifications removed)
- sql logic tests moved to sql_integration.rs
Closes Supporting planning (binding) Nested correlated subquery error with a depth exceeding 1 #19816
Rationale for this change
See #18806
What changes are included in this PR?
See #18806
Are these changes tested?
Yes
Are there any user-facing changes?
outer_queries_schema is removed from PlannerContext.